AITopics

Genre: Research Report > New Finding (0.62)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsNov-21-2025, 08:07:01 GMT

The Numerics of GANs

In this paper, we analyze the numerics of common algorithms for training Generative Adversarial Networks (GANs).

artificial intelligence, eigenvalue, machine learning, (15 more...)

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Oceania > Australia > New South Wales > Sydney (0.05)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Brooks, Robyn, Masden, Marissa

Combinatorial Regularity for Relatively Perfect Discrete Morse Gradient Vector Fields of ReLU Neural Networks

arXiv.org Artificial IntelligenceDec-23-2024

One common function class in machine learning is the class of ReLU neural networks. ReLU neural networks induce a piecewise linear decomposition of their input space called the canonical polyhedral complex. It has previously been established that it is decidable whether a ReLU neural network is piecewise linear Morse. In order to expand computational tools for analyzing the topological properties of ReLU neural networks, and to harness the strengths of discrete Morse theory, we introduce a schematic for translating between a given piecewise linear Morse function (e.g. parameters of a ReLU neural network) on a canonical polyhedral complex and a compatible (``relatively perfect") discrete Morse function on the same complex. Our approach is constructive, producing an algorithm that can be used to determine if a given vertex in a canonical polyhedral complex corresponds to a piecewise linear Morse critical point. Furthermore we provide an algorithm for constructing a consistent discrete Morse pairing on cells in the canonical polyhedral complex which contain this vertex. We additionally provide some new realizability results with respect to sublevel set topology in the case of shallow ReLU neural networks.

artificial intelligence, machine learning, vertex, (18 more...)

2412.18005

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceNov-6-2024

On the Decomposition of Differential Game

Zhou, Nanxiang, Dong, Jing, Li, Yutian, Wang, Baoxiang

To understand the complexity of the dynamic of learning in differential games, we decompose the game into components where the dynamic is well understood. One of the possible tools is Helmholtz's theorem, which can decompose a vector field into a potential and a harmonic component. This has been shown to be effective in finite and normal-form games. However, applying Helmholtz's theorem by connecting it with the Hodge theorem on $\mathbb{R}^n$ (which is the strategy space of differential game) is non-trivial due to the non-compactness of $\mathbb{R}^n$. Bridging the dynamic-strategic disconnect through Hodge/Helmoltz's theorem in differential games is then left as an open problem \cite{letcher2019differentiable}. In this work, we provide two decompositions of differential games to answer this question: the first as an exact scalar potential part, a near vector potential part, and a non-strategic part; the second as a near scalar potential part, an exact vector potential part, and a non-strategic part. We show that scalar potential games coincide with potential games proposed by \cite{monderer1996potential}, where the gradient descent dynamic can successfully find the Nash equilibrium. For the vector potential game, we show that the individual gradient field is divergence-free, in which case the gradient descent dynamic may either be divergent or recurrent.

decomposition, potential game, vector potential game, (14 more...)

2411.03802

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Neural Information Processing SystemsOct-3-2024, 07:57:54 GMT

4588e674d3f0faf985047d4c3f13ed0d-Paper.pdf

In this paper, we analyze the numerics of common algorithms for training Generative Adversarial Networks (GANs). Using the formalism of smooth two-player games we analyze the associated gradient vector field of GAN training objectives. Our findings suggest that the convergence of current algorithms suffers due to two factors: i) presence of eigenvalues of the Jacobian of the gradient vector field with zero real-part, and ii) eigenvalues with big imaginary part. Using these findings, we design a new algorithm that overcomes some of these limitations and has better convergence properties. Experimentally, we demonstrate its superiority on training common GAN architectures and show convergence on GAN architectures that are known to be notoriously hard to train.

algorithm, eigenvalue, gradient ascent, (14 more...)

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
Oceania > Australia > New South Wales > Sydney (0.05)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

arXiv.org Artificial IntelligenceApr-6-2024

Towards Analyzing and Understanding the Limitations of DPO: A Theoretical Perspective

Feng, Duanyu, Qin, Bowen, Huang, Chen, Zhang, Zheng, Lei, Wenqiang

Direct Preference Optimization (DPO), which derives reward signals directly from pairwise preference data, has shown its effectiveness on aligning Large Language Models (LLMs) with human preferences. Despite its widespread use across various tasks, DPO has been criticized for its sensitivity to the SFT's effectiveness and its hindrance to the learning capacity towards human-preferred responses, leading to less satisfactory performance. To overcome those limitations, the theoretical understanding of DPO are indispensable but still lacking. To this end, we take a step towards theoretically analyzing and understanding the limitations of DPO. Specifically, we provide an analytical framework using the field theory to analyze the optimization process of DPO. By analyzing the gradient vector field of the DPO loss function, we find that the DPO loss function decreases the probability of producing human dispreferred data at a faster rate than it increases the probability of producing preferred data. This provides theoretical insights for understanding the limitations of DPO discovered in the related research experiments, thereby setting the foundation for its improvement.

dpo, llm, probability, (13 more...)

2404.04626

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

arXiv.org Artificial IntelligenceMar-19-2024

Some geometric and topological data-driven methods in robot motion path planning

Goldfarb, Boris

Motion path planning is an intrinsically geometric problem which is central for design of robot systems. Since the early years of AI, robotics together with computer vision have been the areas of computer science that drove its development. Many questions that arise, such as existence, optimality, and diversity of motion paths in the configuration space that describes feasible robot configurations, are of topological nature. The recent advances in topological data analysis and related metric geometry, topology and combinatorics have provided new tools to address these engineering tasks. We will survey some questions, issues, recent work and promising directions in data-driven geometric and topological methods with some emphasis on the use of discrete Morse theory.

discrete morse theory, morse theory, vector field, (16 more...)

2403.12725

Country:

North America > United States > New York > Albany County > Albany (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(6 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

Hamm, Keaton, Moosmüller, Caroline, Schmitzer, Bernhard, Thorpe, Matthew

Manifold learning in Wasserstein space

arXiv.org Machine LearningNov-14-2023

This paper aims at building the theoretical foundations for manifold learning algorithms in the space of absolutely continuous probability measures on a compact and convex subset of $\mathbb{R}^d$, metrized with the Wasserstein-2 distance $W$. We begin by introducing a natural construction of submanifolds $\Lambda$ of probability measures equipped with metric $W_\Lambda$, the geodesic restriction of $W$ to $\Lambda$. In contrast to other constructions, these submanifolds are not necessarily flat, but still allow for local linearizations in a similar fashion to Riemannian submanifolds of $\mathbb{R}^d$. We then show how the latent manifold structure of $(\Lambda,W_{\Lambda})$ can be learned from samples $\{\lambda_i\}_{i=1}^N$ of $\Lambda$ and pairwise extrinsic Wasserstein distances $W$ only. In particular, we show that the metric space $(\Lambda,W_{\Lambda})$ can be asymptotically recovered in the sense of Gromov--Wasserstein from a graph with nodes $\{\lambda_i\}_{i=1}^N$ and edge weights $W(\lambda_i,\lambda_j)$. In addition, we demonstrate how the tangent space at a sample $\lambda$ can be asymptotically recovered via spectral analysis of a suitable "covariance operator" using optimal transport maps from $\lambda$ to sufficiently close and diverse samples $\{\lambda_i\}_{i=1}^N$. The paper closes with some explicit constructions of submanifolds $\Lambda$ and numerical examples on the recovery of tangent spaces through spectral analysis.

artificial intelligence, machine learning, optimal transport map, (16 more...)

arXiv.org Machine Learning

2311.08549

Country:

North America > United States > Texas (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry:

Education (0.84)
Government (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Mescheder, Lars, Nowozin, Sebastian, Geiger, Andreas

The Numerics of GANs

Neural Information Processing SystemsFeb-14-2020, 08:58:46 GMT

In this paper, we analyze the numerics of common algorithms for training Generative Adversarial Networks (GANs). Using the formalism of smooth two-player games we analyze the associated gradient vector field of GAN training objectives. Our findings suggest that the convergence of current algorithms suffers due to two factors: i) presence of eigenvalues of the Jacobian of the gradient vector field with zero real-part, and ii) eigenvalues with big imaginary part. Using these findings, we design a new algorithm that overcomes some of these limitations and has better convergence properties. Experimentally, we demonstrate its superiority on training common GAN architectures and show convergence on GAN architectures that are known to be notoriously hard to train.

gan architecture, gradient vector field, numeric, (3 more...)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.83)